Implementation of Monte Carlo Tree Search (MCTS) Algorithm in COMBATXXI using JDAFS

Chapter  1 
Introduction 


Background 

The implementation of the Monte Carlo Tree Search (MCTS) algorithm in the Combined
Arms Analysis Tool for the 21st Century (COMBATXXI) is an extension of work
completed in FY13.^1 The TRADOC Analysis Center - Methods and Research Office
(TRAC-MRO) sponsored this iteration to test the feasibility of implementing
the algorithm in the COMBATXXI simulation environment. For further details on the
specific algorithm or more background information, see appendix B and the previous
technical report.^2


Problem  Statement 

To  execute  the  Monte  Carlo  Tree  Search  (MCTS)  algorithm  for  autonomous 
decision-making  agents  within  COMBATXXI. 


Issue  1:  Integration  of  algorithm  using  JDAFS. 

EEA  1.1:  Will  the  algorithm  execute  using  JDAFS  to  build  state  space? 

EEA 1.2: What changes to the algorithm are necessary for integration into COMBATXXI?

Issue  2:  COMBATXXI  Interface. 

EEA  2.1:  Does  the  algorithm  interface  with  COMBATXXI? 

EEA 2.2: How does the algorithm change the force-on-force decisions in COMBATXXI?


^1 See MAJ Christopher Marks et al. Mission Command Analysis Using Monte Carlo Tree Search. Tech. rep.
TRAC-M-TR-13-050. 700 Dyer Road, Monterey, California 93943: TRADOC Analysis Center - Monterey,
2013. URL: https://ako.hq.tradoc.army.mil/sites/trac/MTRY/SitePages/Home.aspx.

^2 See ibid.
Constraints, Limitations, & Assumptions

• Constraints^3

—  Complete  by  30  JUN  14. 

• Limitations^4

—  We  will  carry  out  all  experimentation  in  simulation  environments 
for  which  we  can  obtain  programmer  support. 

— Reward functions will be defined by the study team.

• Assumptions^5

—  Implementation  will  not  require  modification  of  COMBATXXI. 

—  Algorithm  will  operate  outside  of  COMBATXXI. 

—  Will  require  link  between  COMBATXXI  and  the  algorithm. 

—  Algorithm  will  use  JDAFS  to  populate  state-space. 

—  Contractor  support  for  coding  will  be  available. 


^3 Constraints limit the project team’s options to conduct the research.

^4 Limitations are a project team’s inabilities to investigate issues within the sponsor’s bounds.
^5 Assumptions are research-specific statements that are taken as true in the absence of facts.
Chapter 2

Analysis  and  Methodology 


In this chapter, we describe the methodology we used to approach the problem. Our
methodology included four steps:


1. Define the Problem.

•  Literature  review. 

•  Assemble  the  team. 

2.  Scenario  Development. 

•  Create  prototype  use-case  in  JDAFS. 

•  Expand  use-case  to  implement  using  JDAFS. 

3.  Test  and  Evaluation. 

• Integrate the use-case in COMBATXXI.

•  Test  use-case  against  base-case. 

4.  Analysis 

•  Documentation  of  implementation. 


All completed steps are expanded in the following sections. After further exploration
(see the Project Termination section), this methodology was abbreviated: half of
step 2 and all of step 3 were not completed.

Define  the  Problem 

We conducted the literature review and assembled the team during the initial phase of the
project. The literature review consisted of the materials referenced in the previous technical
report^6 along with additional materials^7 used to become familiar with the algorithm.

^6 See Marks et al., Mission Command Analysis Using Monte Carlo Tree Search, op. cit.

^7 See Mark H. M. Winands, Yngvi Björnsson, and Jahn-Takeshi Saito. “Monte-Carlo tree search solver”. In:
Computers and Games. Springer, 2008, pp. 25-36.
The project team initially held meetings with TRAC-White Sands Missile Range
(TRAC-WSMR)^8 to understand the stakeholders’ concerns. We also met with the sponsor^9 for
approval of the restated problem statement and technical approach.

Scenario  Development 

Prototype 

The scenario development began with the scenario suggested by the previous tech report.^10
The following brief overview is quoted from that report.


“Our  scenario  consists  of  a  group  of  “red”  riflemen  are  advancing  on  a  single 
“blue” grenadier in a fixed position (see figure 1). Armed with an assault rifle
(60  rounds)  and  a  grenade  launcher  (12  grenades),  the  grenadier  must  decide 
with  which  weapon  to  engage  the  enemy.  By  varying  the  number  of  red  troops 
advancing  on  the  blue  grenadier,  we  can  use  the  results  of  MCTS  runs  to  gain 
insight  into  what  situations  the  grenadier  should  choose  each  weapon  over  the 
other  in  order  to  get  the  best  effects,  and  when  it  might  be  best  to  switch 
weapons  during  engagements.  The  red  elements  will  execute  COMBATXXI 
default  behaviors,  i.e.,  they  will  engage  the  blue  entity  once  they  it  is  within 
range.  For  the  purpose  of  this  analysis,  the  advancing  red  troops  will  remain 
relatively  concentrated.  The  results  of  this  analysis  might  be  useful  in 
informing  the  CombatXXI  weapons  selection  behavior,  which  currently  simply 
selects weapons from a weapon-priority list.”^11


^8 Mr. Blane Wilson of Mission Command and Mr. Dave Ohman of Modeling and Simulation
^9 Mr. Paul Works of the Methods and Research Office (MRO)

^10 See Marks et al., Mission Command Analysis Using Monte Carlo Tree Search, op. cit., pp. 54-56.
^11 See ibid., p. 54.
Figure 2. MCTS Implementation Strategy


Implementation  Strategy 

Once we had our initial prototype, we discussed implementation strategy. COMBATXXI is
a high-resolution simulation in which even a small scenario can be computationally expensive.
With this in mind, we re-examined our fundamental objective of evaluating
mission command within the scenario. Running the scenario many times to build
the state space requires us to consider the computational cost.

Our approach (see figure 2) creates predetermined criteria for decisions within a COMBATXXI
scenario. When these conditions are met, the scenario stops. The current entity state
and scenario conditions are transferred into JDAFS, a low-resolution simulation that is not
computationally expensive. The MCTS algorithm runs, building the state spaces in the tree
using the JDAFS simulation.

Once the stopping criterion within the algorithm is reached and a decision is made, the decision
is passed back to COMBATXXI for execution by the entity. COMBATXXI continues the
simulation, executing the decision until completion or until the stopping criteria are met again.
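
The hand-off loop described above can be sketched as follows. This is a minimal illustration only, not the study team's code (the project ended before any implementation). The names `low_res_rollout`, `choose_action`, and `run_scenario`, the reward model, and the toy state are all hypothetical stand-ins for COMBATXXI and JDAFS, and the search is reduced to UCB1 selection over the root actions rather than a full tree.

```python
import math
import random

ACTIONS = ("rifle", "grenade")  # hypothetical decision options


def low_res_rollout(state, action, rng):
    """Toy stand-in for one low-resolution simulation run; returns a reward of 0 or 1."""
    # Assumed reward model: grenades pay off more against larger groups.
    if action == "grenade":
        p_success = min(0.3 + 0.1 * state["red_count"], 0.95)
    else:
        p_success = 0.6
    return 1.0 if rng.random() < p_success else 0.0


def choose_action(state, n_rollouts, rng):
    """UCB1 selection over the root actions (the first level of an MCTS tree)."""
    visits = {a: 0 for a in ACTIONS}
    total = {a: 0.0 for a in ACTIONS}
    for i in range(1, n_rollouts + 1):
        untried = [a for a in ACTIONS if visits[a] == 0]
        if untried:
            action = untried[0]
        else:
            action = max(
                ACTIONS,
                key=lambda a: total[a] / visits[a]
                + math.sqrt(2.0 * math.log(i) / visits[a]),
            )
        total[action] += low_res_rollout(state, action, rng)
        visits[action] += 1
    # Return the most-visited action as the decision.
    return max(ACTIONS, key=lambda a: visits[a])


def run_scenario(red_count, n_rollouts=200, seed=0):
    """Toy high-resolution loop: pause at each decision point, search, resume."""
    rng = random.Random(seed)
    state = {"red_count": red_count, "time": 0}
    decisions = []
    while state["red_count"] > 0 and state["time"] < 10:
        # Decision criterion met: hand a copy of the state to the low-res search.
        decisions.append(choose_action(dict(state), n_rollouts, rng))
        # The high-res simulation resumes and executes the decision
        # (toy effect: one red entity is removed per step).
        state["red_count"] -= 1
        state["time"] += 1
    return decisions
```

The point of the structure is that every rollout runs in the inexpensive model, so the expensive simulation advances only once per decision.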
Implementation Considerations


To implement this strategy, we identified the entity and scenario variables that needed
to be transferred to JDAFS, considering the limitations of the simulator. We first identified
what is important to the scenario and what can be left out. We began framing the answer
but quickly went beyond what the team knew of the simulations’ current
code.

Given our lack of programming expertise, it was clear that we needed contracted
support from a programmer familiar with the languages used in JDAFS and
COMBATXXI. We worked with the NPS contracting cell to announce the requirement, which
became a lengthy process.^12


Project  Termination 

While going through the contracting process, we decided not to pursue a contract effort
because, even with programming support, the cost in computing time would have been prohibitive
to the point of non-use. We presented this to the sponsor^13 and the lead stakeholder^14
during an In-Progress Review. They agreed to terminate the project because of the foreseen
limitations of implementing the algorithm in COMBATXXI: computing time would remain
prohibitive even if the implementation were completed.

Test  and  Evaluation 

This phase was not completed due to the decision not to go forward with contracted
programming support.

Analysis 

This  project  is  documented  in  this  technical  report. 


^12 More than 6 months before the performance work statement was posted.
^13 Mr. Paul Works of the Methods and Research Office (MRO)

^14 Mr. Chad Mullis, Director, Models and Simulation, TRAC-WSMR


Chapter  3 
Conclusion 


The MCTS algorithm has merit for implementation within a simulation environment as an
autonomous decision tool to aid in mission command analysis. COMBATXXI, in its current
configuration, is not the right platform for MCTS implementation, a conclusion with which
the sponsor and lead stakeholder concurred. The findings of this project support implementing the
algorithm using a low-resolution simulation, such as JDAFS, to find the “best” decision, but
implementing it in a high-resolution simulation such as COMBATXXI is not
prudent at this time.
Project Title: Analyzing the Impact of Mission Command in Simulation Environments


Sponsor/Manager:  TRAC-MRO  (Mr.  Paul  Works) 


1  General  Information 


Government  Lead: 

MAJ  Christopher  Marks,  TRAC-Monterey,  ATTN:  ATRC-RDM,  700  Dyer  Road,  Monterey, 
CA  93943,  831-656-3751  (DSN  756-3751),  FAX  831-656-3084,  cemarks@nps.edu. 

Resource  Management  POC: 

MAJ Edward Masotti, TRAC-Monterey, ATTN: ATRC-RDM, 700 Dyer Road, Monterey,
CA 93943, 831-656-6271 (DSN 756-6271), FAX 831-656-3084, emmasott@nps.edu.

Technical  POC: 


LTC Jon Alt, Director, TRAC-Monterey, ATTN: ATRC-RDM, 700 Dyer Road, Monterey,
CA 93943, 831-656-3086 (DSN 756-3086), FAX 831-656-3084, jkalt@nps.edu.

Project Objective: To demonstrate analysis of mission command in military simulation
environments using Monte Carlo Tree Search and other methods from artificial intelligence.

Background: Mission Command is the exercise of authority and direction by the commander
using mission orders to enable disciplined initiative within the commander’s intent to
empower agile and adaptive leaders in the conduct of unified land operations [1]. Military
simulation environments for analysis represent entity decision-making to varying degrees,
but typically limit decision-making to a set of first-order logic rules. This makes it difficult
to conduct analysis to understand the value of a decision in a given situation. TRAC-MTRY
is wrapping up a six-month effort that applied Monte Carlo Tree Search (MCTS), an AI
method for identifying good paths through a decision space, to address several military
decision-making situations:


•  Mission  area  assignment  and  scheduling  for  aerial  platforms. 

•  Weapons  selection  decisions  in  COMBATXXI. 

•  Subordinate  element  assignments  in  the  Joint  Dynamic  Allocation  of  Fires  and  Sensors 
(JDAFS)  simulation  environment. 


The output of this initial effort will be several simple MCTS implementations, along with
analysis and documentation of the results. Based on these results, the team recommends
further research to apply MCTS methods to more complex scenarios in order to conduct
more relevant analyses.

Technical Approach: This follow-on research effort will begin with detailed problem
definition work with the sponsor. Following the problem definition phase, the project team will
develop simulation use-cases with analysis goals oriented on mission command relevant to
TRAC studies and research. These use-cases will include specific implementations of MCTS
or a related AI method and will leverage initial implementation work from the previous
year’s effort conducted in COMBATXXI. The team will develop and execute a design of
experiments for each use-case and will provide documentation of all results to the sponsor
and a recommendation on the use of these techniques to enable mission command analysis.


2  Milestones  and  Deliverables 


Schedule: 

N Receipt of funds

N+1 Problem definition; initial IPR to sponsor.

N+3 Use-cases & analysis goals identified, IPR to sponsor.

N+6 Use-case scenario files and MCTS implementations complete, IPR to sponsor.
N+9 Experimentation complete, final IPR to sponsor.

N+10 Documentation complete.

Deliverables: 


• Deliverable 1. Use-case scenario files and updated MCTS implementations.

•  Deliverable  2.  Design  of  experiments  with  results. 

•  Deliverable  3.  Documentation  of  all  analyses  and  recommendations  for  analysis  of 
mission  command  in  simulation  environments. 


3  Project  Funding  Information 


Total  Funds:  $135,000. 

$90,000  NPS  faculty 

$45,000  TRAC-WSMR  Developer 
Threat Density Map Modeling for Combat Simulations


Francisco  R.  Baez 
Christian  J.  Darken 

Modeling,  Virtual  Environments,  and  Simulation  (MOVES)  Institute 
Naval  Postgraduate  School 
700  Dyer  Road,  Watkins  Ext.  265 
831-656-7582

frbaezto@nps.edu,  cjdarken@nps.edu 
Keywords: 

Threat  Modeling,  Combat  Simulations,  Probability  Theory 

ABSTRACT:  The  modeling  and  simulation  community  has  used  probability  threat  maps  and  other  similar 
approaches  to  address  search  problems  and  improve  decision-making.  Probability  threat  maps  describe  the 
probability  of  a  location  containing  one  or  more  enemy  entities  at  a  particular  time.  Although  useful,  they  only 
describe  the  likelihood  that  the  location  is  occupied  without  addressing  the  degree  to  which  it  is  occupied.  Thus,  we 
investigate  whether  threat  density  maps  that  describe  the  searcher’s  expectation  of  seeing  a  number  of  target  agents 
at  a  certain  location  in  a  given  time  interval  are  a  viable  method  for  improving  synthetic  behaviors  in  combat 
simulations.  As  a  proof  of  principle,  this  paper  introduces  a  probability  model  which  quantifies  the  searcher  agent’s 
subjective  belief  about  the  number  of  enemy  entities  in  a  location,  given  the  initial  information  described  by  a  prior 
density  function  and  the  information  provided  by  the  assumed  sensing  model.  In  addition,  this  paper  discusses  a 
framework  for  initializing  the  model,  as  well  as  the  model’s  key  advantages  and  current  limitations. 


1. Introduction

A  probability  threat  map  is  a  knowledge  representation 
of  the  search  environment  as  a  discrete  probability 
distribution,  which  provides  a  snapshot  in  time  of 
unobserved  threat  locations.  More  specifically, 
probability  threat  maps  are  models  of  the  perceived 
threat  that  describe  the  probability  that  any  given  one 
of  a  number  of  unseen  entities  that  are  moving 
independently  is  in  a  location  (Darken,  McCue,  & 
Guerrero,  2010).  They  have  been  applied  successfully 
to  drive  the  synthetic  behaviors  for  target  scanning  in 
military  training  simulations  by  prioritizing  locations 
that  are  most  likely  to  contain  targets  (Darken  et  al., 
2010; Evangelista, Ruck, Balogh, & Darken, 2011).

Probability  threat  maps  are  derivatives  of  probabilistic 
occupancy  maps  used  by  game  developers  for 
opponent  and  target  tracking  (Isla  &  Blumberg,  2002; 
Isla,  2006);  in  addition,  they  use  methods  and 
techniques  originally  developed  for  mobile  robotics 
designed  to  improve  localization,  search,  navigation, 
and decision-making behaviors (Elfes, 1989; Thrun,
2003). Other analogous approaches have been applied
to  investigate  search  problems  with  incomplete  and 
uncertain  information  using  unmanned  aerial  sensors 
and  autonomous  ground  sensors  (Bertucelli  &  How, 
2005,  2006;  Chung  &  Burdick,  2008,  2012;  Chung, 
Kress,  &  Royset,  2009;  Kagan  &  Ben  Gal,  2013). 

Existing probability threat map approaches for
military simulations (Darken et al., 2010; Evangelista
et al., 2011) provide simulated entities with subjective
knowledge  of  likely  enemy  locations  over  a  defined 
area,  which  is  then  used  to  carry  out  search  decisions 
and  search  behaviors  (e.g.,  select  the  next  search  area, 
modify  movement,  change  tactical  formations,  path 
planning).  These  methods  successfully  improved  the 
representation  of  search  based  on  situational  awareness 
and  environmental  factors  in  military  simulations. 
However, there is a stated need for, and interest in,
expanding these methods to enhance the
representation of search, reasoning, and decision-making
behaviors in combat simulations.

We  believe  that  the  current  implementation  of 
probability  threat  maps  could  be  augmented  with 
additional  subjective  knowledge  of  the  threat  necessary 
to model and simulate combat scenarios. Probability
threat maps use a statistical description of likely enemy
locations but lack the ability to infer the number of
enemy entities from observed data and prior information.
Ideally,  the  searcher  should  gain  whatever  information 
he  can  during  the  search  process  and  then  assess  his 
subjective  belief  to  infer  the  likely  disposition  (i.e. 
location  and  number  of  entities)  of  the  threat. 

A  threat  density  map  is  a  knowledge  representation  of 
the  expected  number  of  the  enemy  entities  located 
inside  each  subdivision  of  the  simulated  area.  More 
specifically,  it  quantifies  the  searcher  agent’s 
expectation  of  finding  a  number  of  enemy  entities  at  a 
particular location in a time interval. The purpose of
threat density maps is to augment combat simulations
with actionable subjective knowledge that can be
exploited by the simulated entities for reasoning and
decision-making in response to the threat and
environmental circumstances.

In  contrast  to  probability  threat  maps,  threat  density 
maps  provide  the  searcher  agent  with  additional  data 
needed  in  combat  simulated  scenarios  to  make  better 
decisions  amongst  different  courses  of  action 
consistent  with  the  situation  presented  by  the  enemy 
forces  (Pew  &  Mavor,  1998).  For  instance,  depending 
on  the  size  of  the  enemy  forces  the  searcher  agent  can 
decide  whether  to  defend,  assault,  attack,  withdraw, 
avoid  combat,  or  bypass.  Such  decisions  would  control 
other  behaviors  such  as  searching  techniques,  path 
planning,  patrolling  strategies,  etc.  In  this  context, 
simulated  entities  would  have  additional  threat 
knowledge  to  reason  and  act  upon. 

In  this  paper,  we  introduce  a  threat  density  map  model 
as a proof of principle. We build on current probability
threat map approaches to model the searcher’s
subjective  belief  regarding  the  threat  size  as  a  posterior 
density  map  instead  of  a  discrete  probability 
distribution.  The  main  contribution  of  this  paper  is  the 
formulation  of  the  proposed  threat  density  map  model 
for  combat  simulations.  This  introductory  section  is 
followed,  in  Section  2,  with  a  description  of  the 
problem  and  the  model  formulation.  Section  3 
describes  the  advantages  and  limitations  of  the  current 
state  of  the  model.  Section  4  provides  concluding 
remarks  and  discusses  the  direction  of  the  future 
research. 

2. Problem Description and Formulation

A threat density map, $tm$, is a random variable defined
over a finite set of locations, $X$, which assigns a score
to each individual cell $x_i \in X$, $i = 1, \ldots, C$, at a certain
time step $t$ describing the expected number of enemy
entities in each cell. The set of locations, $X$, represents
the area of operations discretized into a two-dimensional
grid comprising $C$ total cells, which can
either be unoccupied or occupied by one or more
enemy entities. The random variable
$tm = (tm_1, \ldots, tm_C)$ denotes the state of the threat
density map, where the random variable $tm_i$ indicates
the number of enemy entities in cell $x_i$. Let $k \in \mathbb{Z}^+$ be
the grand total number of enemy entities across all cells
in $X$, namely, $k = \sum_{i=1}^{C} tm_i$.

Our fundamental problem is to infer the unknown
value of $tm_i$, namely the unknown number of enemy
entities located in the individual cells, based on a
sequence of sensing outcomes and assumptions about
the success of those sensing actions. To accomplish
this, we first initialize each cell with a prior density
function, $p(tm_i)$, based on how the searcher believes
the enemy is spatially distributed and the certainty of
prior information available. This prior information is
then combined with the data from our assumed sensing
model, $p(s_i^t \mid tm_i)$, which is the probability density
function of the number of enemy entities sensed in cell
$x_i$ at time step $t$, $s_i^t$, conditional on $tm_i$. Finally, for
each individual cell we update the prior $p(tm_i)$ to the
posterior, $p(tm_i \mid S_i^t)$, with the data from the sensing
model, $p(s_i^t \mid tm_i)$, and infer the expected number of
enemy entities through successive Bayesian updates.

It  is  important  to  define  key  assumptions  required  for 
our  framework.  First,  the  total  number  of  enemy 
entities  in  the  set  of  locations  is  a  priori  unknown  but 
bounded  by  k  enemy  entities.  Second,  the  spatial 
distribution  of  the  enemy  entities  across  the  set  of 
locations  can  be  represented  with  a  prior  density 
function.  Third,  the  number  of  enemy  entities  in  any 
given  cell  is  independent  of  the  number  of  entities  in 
all  other  cells.  Lastly,  sensing  actions  within  the  same 
cell  are  conditionally  independent  from  other  sensing 
actions  whether  in  the  same  cell  or  in  other  cells. 
Clearly,  the  assumptions  of  independence  and 
conditional  independence  may  not  be  realistic  as  the 
knowledge  that  a  cell  is  occupied  or  not  at  a  particular 
time  can  help  figure  out  the  state  of  it  and  other  cells  at 
the  current  and  future  times.  However,  these 
assumptions,  commonly  used  in  related  literature, 
reduce  computational  complexities  and  allow  us  to 
decompose  the  problem  for  solving  threat  density  maps 
for  the  individual  cells  independently  (Thrun,  2003; 
Merali  &  Barfoot,  2013). 

2.1.  Initializing  Threat  Density  Maps 

To initialize $tm_i$ at time step $t = 0$, we choose a prior
density function, $p(tm_i)$, for every cell to represent the
searcher’s subjective belief about the enemy’s spatial
distribution in the location set prior to initiating the
search. This prior density function summarizes the
probability that the random variable $tm_i$ takes on any
given value $n$, which we can write explicitly as
$p(tm_i = n)$.

Defining sensible prior density functions varies by the
type of prior information (i.e., specific, vague,
insufficient) about the enemy and the unknown
parameter $tm_i$. Information regarding the enemy (e.g.,
size,  composition,  known  or  suspected  locations,  likely 
formations  and  movement)  normally  exists  in  military 
scenarios  for  combat  simulations  and  should  be  used  to 
initialize  priors  for  each  cell.  Exact  or  credible 
intelligence  data  available  (e.g.,  intelligence  reports, 
situation  reports,  satellite  imagery)  of  the  enemy  and 
the  environment  can  be  useful  to  define  k  and  strong 
priors  and  perhaps  to  define  other  aspects  of  the  world 
(e.g.,  likely  movement  routes,  probable  employment 
areas, key terrain, obstacles). On the other hand, with
vague intelligence data we might have to assume a
prior based on general considerations (e.g., most
probable or most dangerous enemy disposition).

In brief, we have distinct cases of prior information
available (i.e., specific, vague, and no prior information
available) to consider when specifying $p(tm_i)$. The
inclusion of prior information into the prior $p(tm_i)$ is
one of the benefits of our approach because it leads to
stronger inferences about $tm_i$. Regardless of the level
of certainty, we can specify a prior to quantify
uncertainty around the spatial distribution of the enemy
entities and express what is believed or known about
$tm_i$ before inspecting any cell $x_i \in X$. Below we
discuss a discrete prior density function that can be
used to initialize $p(tm_i)$ with prior information.


2.1.1. Discrete Prior Density Function

Consider the case in which the search agent lacks or
has vague prior information. A common practice in
such a situation is to define a conventional prior, such as
the discrete uniform, that does not favor any particular
value. However, as previously mentioned, prior
information for combat simulation scenarios is
typically available. Therefore, it is sensible to
define a prior density function that can account for a
broad range of possibilities fundamental to combat
simulated scenarios.

For this particular problem, with no idea about the
distribution of $tm_i$, we define a discrete prior assuming
that any cell in $X$ could contain up to $k$ targets evenly
distributed, but that the enemy is more likely to be
nonexistent in some cells. Accordingly, $tm_i$ is a
discrete random variable with a finite range bounded
by $k$, i.e., $\{0, 1, \ldots, k\}$. Further, we assume that the
prior is defined for each cell such that $p(tm_i = n) = 1/k$
for $n = 1, \ldots, k$. However, our prior subjective belief
inclines us to anticipate that many cells will be empty
rather than occupied because enemy forces tend to
cluster together whether they operate as a cohesive large
element or as smaller dispersed elements. To represent
such belief we define the parameter $\varepsilon$ such that each of
the values in the range $1 \le n \le k$ occurs with
probability $\varepsilon(1/k)$ and $n = 0$ occurs with probability
$(1 - \varepsilon)$. That is, the
unconditional prior probability distribution for an
individual cell is given by the following probability
density function:

$$
p(tm_i = n) =
\begin{cases}
\varepsilon / k, & n = 1, 2, \ldots, k \\
1 - \varepsilon, & n = 0 \\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

The expected value of the random variable $tm_i$ for cell
$x_i$ at time step $t = 0$ is:

$$
\mu = E(tm_i) = \sum_{n=1}^{k} n \, \varepsilon \left( \frac{1}{k} \right) = \frac{\varepsilon (k + 1)}{2}
\tag{2}
$$

Although the choice of $\varepsilon$ is subjective, it is also suitable
to initialize $p(tm_i)$ when specific prior information is
available. For instance, suppose we know the mean
number of enemy entities for some specific cells. In
this case, we do not have any difficulty incorporating
this information in $p(tm_i)$. We simply solve Eq. (2) for
$\varepsilon$, i.e., $\varepsilon = 2\mu/(k + 1)$, for $\varepsilon \in [0, 1]$, and use this
value to define the prior of $tm_i$ for those particular
cells.


2.2. Sensing Model


Sensing actions, namely, observing or inspecting cells,
are knowledge-producing events that change the
searcher’s subjective belief of the threat. The
searcher’s ability to observe enemy entities in a cell is
modeled using the combat simulation’s target detection
model, which specifies the probability of detecting a
target, $P_d$, as a function of the brightness of the target,
the brightness of the target’s background, and the
subjective size of the target given that one or more
targets are present in the location. Although $P_d$ varies
by type of target, it is generally constant for targets of
the same type and size, and against a particular
background.

In our framework, sensing actions represent binomial
trials with $(k + 1)$ possible outcomes (i.e., observing
between zero and $k$ enemy entities) of the actual
number of entities in the cell. They return the number
of enemy entities sensed, $s_i^t$, in cell $x_i$ at time step $t$.
Therefore, we specify a binomial sampling model,
$p(s_i^t \mid tm_i)$, which describes the searcher’s ability to
gain subjective knowledge regarding $tm_i$. This
sampling model provides the conditional probability
that $s_i^t$ is $b$ conditioned on $tm_i$ and given $P_d$, i.e.,
$p(s_i^t \mid tm_i) = p(s_i^t = b \mid tm_i = n)$, expressed as

$$
p(s_i^t = b \mid tm_i = n) = \frac{n!}{b!\,(n - b)!} \, (P_d)^{b} \, (1 - P_d)^{n - b}
\tag{3}
$$

for $b = 0, 1, \ldots, k$ and $0 < P_d < 1$. In Eq. (3) the
binomial coefficient $n!/b!(n - b)!$ describes the
number of combinations of $n$ things taken $b$ at a time
without regard to their order; $(P_d)^{b}$ is the likelihood of
$b$ detections given $P_d$; and $(1 - P_d)^{n - b}$ is the
probability of missing $(n - b)$ of the possible
detections.
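
The discrete prior of Eq. (1) and the binomial sampling model of Eq. (3) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `discrete_prior`, `prior_mean`, and `sensing_likelihood` are hypothetical and not from the paper, and the likelihood is taken to be zero whenever more entities are sensed than are present (b > n), which is an assumption consistent with the binomial coefficient.

```python
from math import comb


def discrete_prior(k, eps):
    """Eq. (1): P(tm_i = 0) = 1 - eps and P(tm_i = n) = eps / k for n = 1..k.

    Returns a list indexed by n = 0..k.
    """
    if k < 1 or not 0.0 <= eps <= 1.0:
        raise ValueError("need k >= 1 and 0 <= eps <= 1")
    return [1.0 - eps] + [eps / k] * k


def prior_mean(prior):
    """Expected number of entities under a belief indexed by n (compare Eq. (2))."""
    return sum(n * p for n, p in enumerate(prior))


def sensing_likelihood(b, n, p_d):
    """Eq. (3): probability of sensing b entities when n are actually present."""
    if b < 0 or b > n:
        return 0.0  # assumption: cannot sense more entities than are present
    return comb(n, b) * p_d**b * (1.0 - p_d) ** (n - b)
```

For example, with k = 4 and eps = 0.5 the prior mean computed by `prior_mean` agrees with the closed form eps(k + 1)/2 of Eq. (2).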
2.3. Multiple Sensing Actions

Above we focused on the probability for a single
sensing action at time step $t$. However, our goal is to
infer $tm_i$ based on all cell inspections through time
step $t$. Let $p(S_i^t \mid tm_i)$ indicate the distribution of the
sensing outcomes for cell $x_i$ up to time step $t$ and let
$S_i^t = \{s_i^{T_1}, s_i^{T_2}, \ldots\}$ denote the history of the number of
enemy entities sensed through time step $t$. Assuming
multiple inspections of cell $x_i$ at different time steps,
$s_i^{T_j}$, $j = 1, 2, \ldots$, represents the number of enemy
entities sensed at time $T_j$.

As previously stated, the probability of each sensing
action is conditionally independent of other sensing
actions; specifically, $s_i^t$ and $S_i^{t-1}$ are conditionally
independent given $tm_i$. In other words, if $tm_i$ is
known, additional knowledge of $S_i^{t-1}$ does not change
the searcher’s belief about how many enemy entities he
will see at the next observation ($s_i^t$). Therefore, the
probability of the data set (i.e., history of the enemy
entities sensed) is given by:

$$
p(S_i^t \mid tm_i) = p(s_i^t \mid tm_i) \, p(S_i^{t-1} \mid tm_i) = \prod_{j} p(s_i^{T_j} \mid tm_i)
\tag{4}
$$

2.4.  Updating  Threat  Density  Maps 

We now discuss how to update probabilities after a
new sensing action is performed. According to
Bayesian inference, we can estimate the posterior
through time step $t$, $p(tm_i \mid S_i^t)$, given a prior on $tm_i$
and the data resulting from the sensing model,
$p(s_i^t \mid tm_i)$. Applying Bayes' rule to the terms
$p(S_i^t \mid tm_i)$ and $p(S_i^{t-1} \mid tm_i)$ in Eq. (4), together with the
conditional independence assumption of the sensing
actions, results in the posterior density function
$p(tm_i \mid S_i^t)$ of $tm_i$ given the history of enemy entities
sensed in cell $x_i$ through time step $t$. The posterior
density is given by

$$p(tm_i \mid S_i^t) = \frac{p(s_i^t \mid tm_i)\, p(tm_i \mid S_i^{t-1})}{p(s_i^t \mid S_i^{t-1})} \tag{5}$$

where $p(s_i^t \mid tm_i)$ is obtained from the sensing model in
Eq. (3); $p(tm_i \mid S_i^{t-1})$ represents either a prior at time
step $t = 0$, i.e., $p(tm_i)$, or a posterior without the most
recent sensing action result; and $p(s_i^t \mid S_i^{t-1})$ is the
normalization factor resulting from marginalizing over
$tm_i$ and applying the foregoing conditional
independence assumption of sensing actions given $tm_i$:

$$p(s_i^t \mid S_i^{t-1}) = \sum_{n=0}^{k} p(s_i^t \mid tm_i = n)\, p(tm_i = n \mid S_i^{t-1}) \tag{6}$$

Substituting Eq. (6) in Eq. (5), the individual cell
beliefs can be updated using the following:

$$p(tm_i \mid S_i^t) = \frac{p(s_i^t \mid tm_i)\, p(tm_i \mid S_i^{t-1})}{\sum_{n=0}^{k} p(s_i^t \mid tm_i = n)\, p(tm_i = n \mid S_i^{t-1})} \tag{7}$$

Eq. (7) results in a distribution of the unknown number
of enemy forces in the cell conditioned on the observed
sample data. Thus we have a probability model that
quantifies the searcher’s new state of subjective belief
about $tm_i$, given the initial information described by
the prior $p(tm_i)$ and the information provided by the
sensing model $p(s_i^t \mid tm_i)$.
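
The update in Eq. (7) can be sketched in a few lines of Python. This is an illustration only: the function names, the list-based belief representation (index $n$ holds $p(tm_i = n)$), and the numerical prior are assumptions chosen for the example, not values from the original simulation.

```python
from math import comb

def sensing_likelihood(b, n, p_d):
    """Eq. (3): probability of sensing b entities when the cell holds n."""
    if b < 0 or b > n:
        return 0.0
    return comb(n, b) * p_d**b * (1.0 - p_d)**(n - b)

def update_belief(prior, b, p_d):
    """Eq. (7): posterior over n = 0..k after sensing b entities in the cell."""
    unnorm = [sensing_likelihood(b, n, p_d) * p for n, p in enumerate(prior)]
    evidence = sum(unnorm)  # Eq. (6): normalization factor
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the prior")
    return [u / evidence for u in unnorm]

# Hypothetical prior over n = 0..4 with support on 0, 1 and 3 only.
prior = [0.4, 0.3, 0.0, 0.3, 0.0]
posterior = update_belief(prior, b=0, p_d=0.65)
```

Because each call returns a belief in the same form as its input, repeated inspections of a cell are handled by feeding the posterior back in as the next prior, which mirrors the recursive structure of Eq. (5).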

2.5.  Inference  about  the  Number  of  Enemy  Entities 

During initialization we estimate the expected number
of enemy entities for every cell from the prior
probabilities and maintain this during runtime until the
cell posterior distribution is updated after a sensing
action. Once the posterior is computed, we utilize Eq.
(8) to determine the expected number of entities in cell $x_i$.

$$E(tm_i \mid S_i^t) = \sum_{n=0}^{k} n\, p(tm_i = n \mid S_i^t) \tag{8}$$

3. Advantages of Threat Density Maps

In  this  section  we  discuss  the  advantages  of  providing 
simulated  entities  with  threat  density  maps  as  well  as 
the  limitations  of  the  current  state  of  the  model.  Simply 
put,  the  main  advantage  of  the  proposed  approach  as 
compared  to  probability  threat  maps  is  that  a  threat 
density  map  provides  a  probability  distribution  of  the 
unknown  number  of  enemy  entities  and  the  expected 
number  of  enemy  entities  in  a  cell,  which  can  be 
influenced  by  a  detailed  prior  distribution.  To 
conceptualize  the  notion  of  threat  density  maps  applied 
to  combat  simulations  and  to  demonstrate  its 
practicality  and  advantages,  we  coded  and 
implemented  in  a  rudimentary  JavaScript  simulation 
the  aforementioned  threat  density  map  and  for 
comparison,  an  adaptation  of  the  probability  threat 
maps  (see  Appendix  1)  discussed  in  Darken  et  al. 
(2010). 

The notional scenario consists of a simulated infantry
soldier (searcher) searching for an enemy fireteam to
either engage them or report their disposition,
location, and actions. From intelligence data the
searcher knows that the enemy fireteam (targets) is not
moving and consists of three entities close together and
one scout far ahead. Figure 2(a) shows the targets'
actual distribution, i.e., one cell containing one target
and another containing three, which is
unknown to the searcher. Based on their doctrinal
spatial dispersion and the size of a cell, we initialized
threat density maps for the individual cells assuming
that any cell could contain one or three targets but not
two or four, yet being free of enemy entities is even
more probable than occupation by one or three. Figure
1 shows an example of this prior for a single cell.
Finally, we assumed a uniform prior to initialize the
probability threat map, and the probability of detection
remained constant for the simulation, i.e., $P_d = 0.65$.


Figure 1: Discrete prior distribution of $tm_i$ for a single cell
with $\epsilon = 0.4$, assuming that it is more likely that the
cell is occupied (containing either one or three targets)
than empty.

One of the main advantages of our Bayesian approach
to threat density maps is the availability of a posterior
distribution of the unknown number of targets in a cell
rather than a single value as in the probability threat
map approach. For example, consider the situation
shown in Figure 2 in which the searcher sensed zero
targets after inspecting cell $x_i$. The low probability
value in the probability threat map [Figure 2(b)]
indicates that cell $x_i$ is less likely to contain one or
more targets when compared to the other cells.
However, the searcher lacks knowledge about the
degree to which cell $x_i$ is occupied, when in fact it
can be empty or occupied by one or three targets
because cell inspections are not perfect. The coarse
threat knowledge provided by the probability threat
map, although useful for search decisions, is not
sufficient for making decisions related to tactical
courses of action.

On the other hand, the threat density map posterior
distribution summarizes the state of knowledge about
the unknown number of targets in the cell conditional
on the prior and sensing data. In contrast to the
probability threat map, the threat density map in Figure
2(d) suggests that although cell $x_i$ is more likely to be
empty there is still a chance to find one or three targets
in the cell. In this situation, the posterior distribution of
$tm_i$ provides the searcher with a more accurate picture
of the likely state of cell $x_i$. This more detailed
representation of threat knowledge provides the
searcher the basis for a more confident course of action
selection.


Figure  2:  Screenshot  of  the  simulated  scenario  at  time 
step  t  =  0.25  where  the  searcher  is  depicted  in  blue 
and  the  targets  are  depicted  in  red  (a),  the  probability 
threat  map  (b)  and  threat  density  map  consisting  of  the 
expected  number  of  targets  (c)  and  the  related 
probability  distributions  of  the  number  of  targets  (d). 

Consider the situation in Figure 3 in which the searcher,
after inspecting several cells, sensed two targets (dark
red entities) in cell $x_i$. For such a situation, it would be
difficult for the searcher to select a course of action
that provides the best possibility of success based
solely on the probability threat map. Therefore, it is
appealing to quantify the searcher’s expectation of
finding a number of targets at the cell. Updating the
threat density map’s prior information with sensed
data provides interpretable answers, such as the event
that $tm_i$ equals three has probability of one [Figure
3(d)]; thus, the searcher could expect to see three targets
in the cell [Figure 3(c)]. Then, he can exploit this
subjective knowledge to make reasonable decisions
consistent with the likely state of the threat, for
example, decide to search the cell for the unobserved
target or to move out of the cell and avoid combat.
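
The certainty reported for Figure 3(d) follows directly from Eqs. (3) and (7). As a worked check, assume the belief held before this observation has support only on $n \in \{0, 1, 3\}$, as in the prior described earlier, and that $b = 2$ entities are sensed. The symbol $\pi_n$ for the belief mass on $tm_i = n$ before the observation is introduced here for brevity and is not part of the original notation.

$$
p(s_i^t = 2 \mid tm_i = n) = 0 \quad \text{for } n \in \{0, 1\},
\qquad
p(tm_i = 3 \mid s_i^t = 2)
  = \frac{\binom{3}{2} P_d^{2} (1 - P_d)\, \pi_3}
         {\binom{3}{2} P_d^{2} (1 - P_d)\, \pi_3}
  = 1 .
$$

Two entities cannot be sensed in a cell holding fewer than two, and the assumed prior excludes $n = 2$ and $n = 4$, so all posterior mass falls on $n = 3$ for any $0 < P_d < 1$.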

Likewise, threat density map data can also be used to
support reasoning. Consider a separate simulation run
(Figure 4) in which the searcher sensed one target (dark
red entity) in cell $x_i$ given $P_d = 0.9$. Based on the
threat density map the searcher could assume with a
high degree of certainty that he found the scout entity
of the enemy fireteam and hence could use this belief
for identifying the neighboring cell that could contain
the remaining three targets and to determine how he
deploys, orients, and engages the remaining targets.


Figure 3: Screenshot of the scenario and the state of
subjective threat knowledge in which the searcher
sensed two targets (depicted in dark red) in cell $x_i$.
[Panel title: Distribution of the Number of Targets]

Figure 4: Screenshot of the scenario and the state of
subjective knowledge in which the searcher sensed one
target (depicted in dark red) in cell $x_i$.


3.1.  Integrating  Prior  Information 

The incorporation of a prior density function for $tm_i$
with prior information is the final favorable feature of
the threat density map that differentiates it from the
probability threat map. As previously mentioned,
intelligence data or prior information is typically
available for combat simulated scenarios. Regardless of
the level of certainty of the prior information, we can
use the aforementioned discrete prior density function
or other suitable discrete distributions to describe
uncertainty for $tm_i$ in a mathematical model. However,
from a modeling perspective the difficulty is in how to
effectively integrate prior information from different
sources (e.g., intelligence, doctrine, environment) using
a prior density function (Blasco, 2007).

In  Figure  1  above  we  already  demonstrated  an  example 
for  initializing  threat  density  maps  given  prior 
information  and  intelligence  data  (i.e.  the  total  number 
of  targets  and  their  tactical  formation).  Below  we 
briefly  discuss  two  cases  of  prior  information  available 
common  to  combat  simulated  scenarios  for  initializing 
threat  density  maps. 

First, presume that the prior information available
consists only of the total number of enemy entities (a
fireteam of four entities) and their posture (not moving)
but neither their actual location nor their tactical
formation is known. In this situation of vague prior
information it is sensible to assume that any cell could
contain up to four enemy entities, and logically we can
expect that many cells will be empty instead of
occupied. Accordingly, we could set the value of $\epsilon$ to
be 0.75 and utilize Eq. (1) to initialize threat density
maps for each cell $x_i \in X$. Figure 5 shows the prior
distribution for cell $x_i$.


Figure 5: Discrete prior distribution of $tm_i$ for cell $x_i$,
assuming that it is more probable to be empty and
equally likely to be occupied by at least one and no
more than four targets.

The  plot  in  Figure  5  shows  that  it  is  more  likely  for  a 
cell  to  be  unoccupied  and  equally  possible  to  be 
occupied  by  one,  two,  three,  or  four  enemy  entities. 
The  expected  number  of  enemy  entities  in  each  cell  is 
1.875,  thus  the  searcher  can  expect  to  find 
approximately  two  enemy  entities  in  any  particular  cell 
at  the  next  time  step. 
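
The expectation quoted above can be checked with a few lines of Python. This is a sketch under a stated assumption: Eq. (1) is not reproduced in this section, so the prior below simply places mass 0.25 on the empty cell and splits the remaining 0.75 equally over one to four entities, which is the allocation that reproduces the reported value of 1.875. The function name `expected_entities` is illustrative.

```python
def expected_entities(belief):
    """Eq. (8): expected number of enemy entities, sum of n * p(tm_i = n)."""
    return sum(n * p for n, p in enumerate(belief))

# Assumed prior over n = 0..4: 0.25 on empty, 0.75 / 4 on each of 1..4.
prior = [0.25] + [0.75 / 4] * 4
print(expected_entities(prior))  # 1.875
```

Under this assumed prior the empty cell remains the single most likely value (0.25 against 0.1875 for each occupied count), consistent with the description of Figure 5.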
Second, specific prior information can easily be
incorporated through the prior density function. For
example, suppose that from the most current
intelligence data available it is known that there is a
squad-size element in a linear defense, arrayed from
the southwest corner of the area of operations to the
northeast corner, heavily concentrated in one cell and
defending the southeast sector of the area of operations,
as depicted in Figure 6. Incorporating this prior
information into the model can be done in a flexible
manner and inferences can be compared under different
priors in order to choose a prior that characterizes the
most likely threat situation. One alternative, for
example, is to set the value of $\epsilon$ equal to one for the
cells known to be occupied and zero otherwise. Such
an approach can be efficient but it does not account for
the possibility that the situation could change before
the searcher reaches any of these cells. Therefore, one
could select other values of $\epsilon$ for the cells of interest.
Perhaps another alternative is to deduce from doctrine
and terrain data the maximum number of enemy
entities that can occupy a cell and other relevant factors
to initialize the priors for each cell so that they produce
expected values of two or five enemy entities in the
occupied cells, and zero otherwise.

Figure 6: Screenshot of the location set with ground
truth data. The searcher is depicted in blue and the
targets in a linear defense, heavily concentrated in one
cell, are depicted in red.

3.2.  Current  Limitations  of  Threat  Density  Maps 

As we have seen in the previous examples, there are
significant advantages to augmenting combat simulated
scenarios with threat density maps, as they provide
simulated entities with actionable subjective
knowledge to make course of action decisions, which
in turn determine other search, movement, and path
planning behaviors. However, the proposed approach
has some fundamental limitations. While the
assumptions of independence and conditional
independence, described in Section 2, allow us to
solve the threat density maps for the individual cells,
the model excludes features for modeling spatial
dependencies and temporal effects. This limitation is
evident in Figure 3(c) and 3(d), as the model properly
estimates the expected number of enemy entities in the
cell, i.e., $E(tm_i) = 3.0$, essentially due to the
inclusion of prior information into the model; however,
it fails to exploit this information for estimating $tm_i$ for
the other cells.

4. Conclusions and Future Directions

In this paper we proposed a threat modeling approach
for estimating the number of enemy entities at a
certain location in a given time interval. The model
estimates the expected number of enemy entities as a
posterior density map, can be initialized with
intelligence reports and prior information, and works
for any number of enemy entities and their spatial
distribution. Although threat density maps are
not required for all combat simulation models and
scenarios, they offer several important advantages over
probability threat maps that make them suitable for
implementation in combat simulations for improving
the representation of search, reasoning, and
decision-making behaviors.

Efforts  are  underway  to  introduce  probability 
distributions  that  can  model  threat  movement. 
Furthermore,  future  work  will  focus  on  addressing 
known  limitations  and  extending  the  proposed  model 
by  introducing  spatial  and  temporal  dependencies  and 
interactions,  and  developing  hierarchical  threat  density 
map  representations.  Finally,  we  plan  to  experiment 
with  and  characterize  the  utility  of  the  model  for 
improving  the  capabilities  of  simulated  entities  in  a 
combat  simulation  scenario  for  different  threat 
conditions. 
Appendix 1: Probability Threat Map Adaptation

In this section we briefly describe our basic adaptation
of the probability threat maps approach discussed in
Darken et al. (2010).

Let $q_i'$ be the conditional probability that an unseen
enemy entity is present in cell $x_i \in X$ after
inspecting cell $x_j$, where $x_i \neq x_j$; let $q_i$ be the estimated
probability before inspecting the cell; and let $P_d$ be the
probability of detecting a target (see Section 2.2).
According to the axioms of probability theory,
$0 \le q_i \le 1$ and the total probability over all $C$ cells is
$\sum_{i=1}^{C} q_i = 1$. Suppose the searcher inspects cell $x_j$;
assuming that cell inspections are independent of
neighboring cells, $q_i'$ takes the form

where the term $f$ is an indicator function that equals
zero if $x_i = x_j$ and equals one otherwise.
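
The displayed update is not reproduced in this section, so the following is only a sketch of the standard Bayesian search update after an unsuccessful inspection, which may differ in form from the adaptation used by the authors. It assumes a single unseen target distributed over the cells, so that the cell probabilities sum to one; the function name and arguments are illustrative.

```python
def update_after_miss(q, j, p_d):
    """Update cell probabilities after inspecting cell j and detecting nothing."""
    miss = 1.0 - q[j] * p_d  # probability that the inspection finds nothing
    updated = []
    for i, q_i in enumerate(q):
        indicator = 0 if i == j else 1  # zero for the inspected cell
        # The inspected cell keeps only the mass that survived a missed detection.
        numerator = q_i if indicator else q_i * (1.0 - p_d)
        updated.append(numerator / miss)
    return updated

q = [0.25, 0.25, 0.25, 0.25]  # uniform prior over four cells
q_new = update_after_miss(q, j=0, p_d=0.65)
```

The inspected cell loses probability mass and the others gain it in proportion, while the total remains one. This is the single-value behavior that the threat density map is contrasted with in Section 3.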
Appendix C

Mission Command Analysis Using Monte Carlo Tree Search in JDAFS

Introduction

The TRAC Methods and Research Office has initiated an effort to improve
analysis methodology for military operations employing network-enabled mission
command. Representation of network-enabled operations has improved significantly in
military simulation models over the past decade and a half; however, several key
challenges remain. Foremost is the need to rapidly produce relevant analysis accounting
for the operational effects of network-enabled capabilities supporting mission command.

TRAC-Monterey is carrying out supporting research to produce a documented
and tested methodology that applies Monte Carlo Tree Search methods to decision
situations in order to expand mission command oriented analysis. Mission command
features decentralized execution with subordinate commanders exercising disciplined
initiative while acting aggressively and independently to accomplish the mission within
the commander’s intent. The methodology will improve analysis by extending data
developed from operational data, wargames, and other subject-matter expertise elicitation
into a simulation environment where more extensive and rigorous analysis can be
accomplished.

Scope  of  Work 

The scope of this work was to design an implementation of the MCTS method in the
Fires Allocation (FA) scenario in the Joint Dynamic Allocation of Fires and Sensors
(JDAFS) simulation environment. Programming and testing of the implementation was
left to future work.

The next section gives a brief overview of the JDAFS simulation, followed by
a brief overview of the Monte Carlo Tree Search algorithm. A description of the
design for adding MCTS to JDAFS for fires allocation comes after that.
Finally, next steps are discussed.

Joint Dynamic Allocation of Fires and Sensors (JDAFS)


The Joint Dynamic Allocation of Fires and Sensors (JDAFS) is a low-resolution,
constructive entity-level simulation framework that can be rapidly configured and
executed and allows force-on-force analysis of differing UAS mixes. This model can
simulate UAS operations with optimization in the loop to gain greater insight into UAS
interactions and interactions between UASs and sensor targets for a given NAI.

JDAFS provides the ability to conduct quick operational analyses of Joint and
Army assets by providing a model that is extremely flexible and configurable, and that
enables an analyst to very quickly create a simulation model that captures the first-order
effects of a scenario. Currently, JDAFS represents aircraft schedules, but does not
adequately represent deconfliction of air assets. JDAFS is also well suited for establishing
Joint Starting Conditions for any given scenario. JDAFS provides an effective simulation
modeling tool for quick-turn analysis that compares operational policies
and control measures in order to identify those policies and measures which provide the
greatest operational performance, focusing primarily on fires and effects.

Fires  allocation 

A shooter platform has its fires directed by an instance of a Constrained Value
Optimizer (CVO). The CVO is so named because its initial implementation assigns
shooters to targets by solving an optimization problem. The default CVO in JDAFS
formulates an integer linear program based on which targets have been detected and
which shooters are available. Since a shooter may have several types of munitions, the
assignments are based on the properties of the munition-target pairings. For each pair, an
objective coefficient is calculated for the linear program. The algorithm
for this calculation is performed in an instance of a Value of Potential Assignment (VPA)
class. The default one is described here, but the software has been written so that
different computations could be made.

The default VPA calculates the expected net value of a shooter platform engaging
a given target $j$ with a given weapon $i$: $c_{ij} = v_j p_{ij} - v_i p_{ji}$, where $v_j$ is the
value of target $j$, $p_{ij}$ is the probability that target $j$ is killed by munition $i$, $v_i$ is
the value of the shooter platform of weapon $i$, and $p_{ji}$ is the probability that target $j$ kills
the shooter of munition $i$ when they engage. Thus, the optimization problem to be solved
is:

$$\max \sum_{i,j} c_{ij} x_{ij}$$

subject to

$$\sum_{j} x_{ij} \le A_i \quad \forall i$$

$$\sum_{i} x_{ij} \le B_j \quad \forall j$$

$$\sum_{i} x_{ij} \ge C_j \quad \forall j$$

$$x_{ij} \in \{0, 1\}$$

where $A_i$ is the maximum number of targets that can be assigned to shooter $i$, $B_j$ is the
maximum number of shooters that can be assigned to target $j$, and $C_j$ is the minimum
number of shooters that must be assigned to target $j$. Note that the constraint matrix of the
formulation is totally unimodular, and therefore the linear programming relaxation gives
an integral optimal solution.
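
To make the formulation concrete, the sketch below computes the objective coefficients and solves a tiny instance by exhaustive enumeration. It is not the JDAFS solver: enumeration stands in for the integer linear program only because the instance is small, and every value, probability, and bound in it is a made-up illustration.

```python
from itertools import product

def net_value(v_target, p_kill, v_shooter, p_killed):
    """Objective coefficient: expected gain minus expected loss of a pairing."""
    return v_target * p_kill - v_shooter * p_killed

def best_assignment(c, A, B, C):
    """Enumerate all 0/1 assignments x[i][j] and keep the best feasible one."""
    n_s, n_t = len(c), len(c[0])
    best_val, best_x = None, None
    for bits in product((0, 1), repeat=n_s * n_t):
        x = [bits[i * n_t:(i + 1) * n_t] for i in range(n_s)]
        rows_ok = all(sum(x[i]) <= A[i] for i in range(n_s))
        cols = [sum(x[i][j] for i in range(n_s)) for j in range(n_t)]
        cols_ok = all(C[j] <= cols[j] <= B[j] for j in range(n_t))
        if not (rows_ok and cols_ok):
            continue  # violates a shooter or target constraint
        val = sum(c[i][j] * x[i][j] for i in range(n_s) for j in range(n_t))
        if best_val is None or val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Hypothetical data: two shooters (value 5 each), two targets (values 10, 4).
p_kill = [[0.8, 0.5], [0.3, 0.9]]    # p_kill[i][j]: munition i kills target j
p_killed = [[0.1, 0.2], [0.4, 0.1]]  # p_killed[i][j]: target j kills shooter i
v_t, v_s = [10.0, 4.0], [5.0, 5.0]
c = [[net_value(v_t[j], p_kill[i][j], v_s[i], p_killed[i][j])
      for j in range(2)] for i in range(2)]
value, x = best_assignment(c, A=[1, 1], B=[1, 1], C=[0, 0])
```

With these numbers the coefficients are 7.5, 1.0, 1.0, and 3.1, so the best feasible assignment pairs shooter 0 with target 0 and shooter 1 with target 1. Enumeration grows exponentially in the number of pairings, so it only illustrates what the optimizer is asked to decide.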


Monte  Carlo  Tree  Search  (MCTS) 

Monte Carlo Tree Search (MCTS) is a method for finding optimal or near-optimal
solutions by using Monte Carlo random sampling of potential decisions and building a
search tree. The tree is used to evaluate the value of decisions, informed by the results of
exploring various branches of the tree. When a pre-determined computational budget has
been reached, the algorithm terminates and the action with the highest value is selected
as the next “move.” Each node of the tree contains the (current) value of the node and the
number of times it has been visited. The general form of the algorithm proceeds in four
steps (Browne et al., 2012):

1. Selection. A child node that is expandable (i.e., is a non-terminal state and has
unvisited children) is selected.

2. Expansion. One (or more) child nodes are added to the selected node based on
feasible actions.

3. Simulation. A simulation is executed from each new node according to the
default policy (described below) to produce an outcome.

4. Backpropagation. Each parent node of the selected one is updated to reflect
the outcome.

Two policies are applied that customize the general algorithm; these are:

1. The tree policy determines which node is selected during the Selection step.

2. The default policy determines how the game is played out from a given node
to produce a value.

The general MCTS approach can therefore be summarized by the following pseudo-code
(Browne et al., 2012):

    create root node v_0 with state s_0
    while within computational budget do
        v_l <- TreePolicy(v_0)
        Δ <- DefaultPolicy(s(v_l))
        Backup(v_l, Δ)
    return a(BestChild(v_0))


The tree policy that will be used for JDAFS is based on the Upper Confidence
Bounds for Trees (UCT) algorithm (Browne et al., 2012). This selects the best child $v_j$ of
node $v$ based on the following formula:


where $N(v)$ is the number of times node $v$ has been visited, $v_j$ is a child node of $v$, and
$Q(v)$ is the current value of node $v$. The node $v_j$ with the largest value of $UCT_j$ is the one
selected next.
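
The four steps and the UCT rule can be sketched together on a toy problem. The sketch below follows the form given by Browne et al. (2012), $UCT_j = Q(v_j)/N(v_j) + c\sqrt{2 \ln N(v) / N(v_j)}$, and treats $Q$ as accumulated reward so that $Q/N$ is the mean. That reading, the constant $c = 1/\sqrt{2}$, the bit-string "game", and its reward function are all assumptions made for illustration and are unrelated to JDAFS.

```python
import math
import random

DEPTH = 4  # toy game: choose four bits, one per move

def actions(state):
    return [] if len(state) == DEPTH else [0, 1]

def reward(state):
    # Toy payoff: the first bit dominates, later bits add a little.
    return 0.7 * state[0] + 0.1 * sum(state[1:])

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = list(actions(state))
        self.N = 0    # visit count N(v)
        self.Q = 0.0  # accumulated reward Q(v)

def uct(parent, child, c):
    return child.Q / child.N + c * math.sqrt(2.0 * math.log(parent.N) / child.N)

def best_child(node, c):
    return max(node.children, key=lambda ch: uct(node, ch, c))

def tree_policy(node, c):
    while actions(node.state):       # stop at terminal states
        if node.untried:             # expansion: add one unvisited child
            a = node.untried.pop()
            child = Node(node.state + (a,), node, a)
            node.children.append(child)
            return child
        node = best_child(node, c)   # selection by UCT
    return node

def default_policy(state, rng):
    while actions(state):            # uniformly random playout
        state = state + (rng.choice(actions(state)),)
    return reward(state)

def backup(node, delta):
    while node is not None:          # backpropagation up to the root
        node.N += 1
        node.Q += delta
        node = node.parent

def mcts(root_state, budget, c=1 / math.sqrt(2), seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(budget):
        leaf = tree_policy(root, c)
        backup(leaf, default_policy(leaf.state, rng))
    return best_child(root, 0.0).action  # exploit only for the final choice

best_action = mcts((), budget=200)
```

In this toy game every playout that starts with bit 1 scores at least 0.7 and every playout that starts with bit 0 scores at most 0.3, so the search returns 1 as the first move. In a fires-allocation setting the random playout would be replaced by simulation replications of the subsequent battle.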

Application  of  Monte  Carlo  Tree  Search  to  JDAFS 

The logical place in JDAFS to apply the MCTS algorithm is the allocation of
fires; that is, to develop a replacement Constrained Value Optimizer (CVO) for JDAFS
based on the MCTS algorithm described above.

The implementation of an MCTS CVO would necessarily take into account the
subsequent possible behavior of the model following the allocation. Specifically, the
“simulation” stage of MCTS would involve replications of simulating the subsequent
battle given a particular allocation. This has the potential to produce superior allocations,
since the current default CVO is based on a static model.


Implementing the simulation stage of MCTS involves solving several
coding issues. First, JDAFS would need a mechanism to save the current “real” state of
the model so it could be returned to following the output of the MCTS algorithm. It
would need to be saved as an initial state for the simulation replications as well. Currently
JDAFS does not support this capability.

Second,  a  means  to  restore  and  recover  the  state  of  the  individual  entities  would 
need  to  be  implemented,  since  the  MCTS  simulation  would  necessarily  be  modifying 
those  states. 

Third,  there  are  units  in  JDAFS  that  are  not  necessarily  detected,  and  so  would  not 
be  in  the  initial  allocation  list,  yet  would  impact  the  simulation  going  forward.  How  to 
incorporate  those  undetected  entities  is  a  problem  that  would  have  to  be  solved. 

Finally,  to  fully  take  into  account  uncertainties  about  the  enemy  force,  there 
would  have  to  be  a  mechanism  for  forecasting  what  (undetected)  enemy  forces  might 
exist  and  incorporating  such  pseudo-entities  into  the  simulation  stage  of  MCTS. 
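
One way to picture the first two issues is a snapshot-and-restore wrapper around the model state. The sketch below uses a hypothetical `Model` class with a made-up `step` method; it does not reflect the JDAFS code base, and deep-copying is only one of several possible mechanisms.

```python
import copy
import random

class Model:
    """Hypothetical stand-in for a simulation model with mutable entity state."""
    def __init__(self):
        self.time = 0.0
        self.entities = {"shooter_1": {"rounds": 10}, "target_1": {"alive": True}}

    def step(self, rng):
        """Advance one notional event; this mutates entity state."""
        self.time += 1.0
        self.entities["shooter_1"]["rounds"] -= 1
        if rng.random() < 0.5:
            self.entities["target_1"]["alive"] = False

def evaluate_allocation(model, replications, horizon, seed=0):
    """Run look-ahead replications on copies, leaving the real model untouched."""
    rng = random.Random(seed)
    snapshot = copy.deepcopy(model)      # saved "real" state
    kills = 0
    for _ in range(replications):
        trial = copy.deepcopy(snapshot)  # fresh initial state per replication
        for _ in range(horizon):
            trial.step(rng)
        kills += not trial.entities["target_1"]["alive"]
    return kills / replications

real = Model()
score = evaluate_allocation(real, replications=100, horizon=3)
```

After the call, `real` still holds its original time and entity state, which is the property the MCTS simulation stage needs. The third and fourth issues, undetected units and forecast pseudo-entities, would require adding entities to each trial copy and are not addressed by this sketch.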
Appendix E
Glossary

COMBATXXI    Combined Arms Analysis Tool for the 21st Century
EEA          Essential Elements of Analysis
JDAFS        Joint Dynamic Allocation of Fires and Sensors
MCTS         Monte Carlo Tree Search
MOVES        Modeling, Virtual Environments, and Simulation
NPS          Naval Postgraduate School
TRAC         Training and Doctrine Command Analysis Center
TRAC-MRO     Training and Doctrine Command Analysis Center - Methods and Research Office
TRAC-MTRY    Training and Doctrine Command Analysis Center - Monterey
TRAC-WSMR    Training and Doctrine Command Analysis Center - White Sands Missile Range
TRADOC       Training and Doctrine Command


